
    Semisupervised Speech Data Extraction from Basque Parliament Sessions and Validation on Fully Bilingual Basque–Spanish ASR

    In this paper, a semisupervised speech data extraction method is presented and applied to create a new dataset designed for the development of fully bilingual Automatic Speech Recognition (ASR) systems for Basque and Spanish. The dataset is drawn from an extensive collection of Basque Parliament plenary sessions containing frequent code switching. Since session minutes are not exact, only the most reliable speech segments are kept for training. To that end, we use phonetic similarity scores between nominal and recognized phone sequences. The process starts with baseline acoustic models trained on generic out-of-domain data, then iteratively updates the models with the extracted data and applies the updated models to refine the training dataset, until the improvement observed between two iterations becomes small enough. A development dataset, comprising five plenary sessions not used for training, has been manually audited for tuning and evaluation purposes. Cross-validation experiments (with 20 random partitions) have been carried out on the development dataset, using the baseline and the iteratively updated models. On average, the Word Error Rate (WER) drops from 16.57% (baseline) to 4.41% (first iteration) and further to 4.02% (second iteration), corresponding to relative WER reductions of 73.4% and 8.8%, respectively. When considering only Basque segments, the WER drops on average from 16.57% (baseline) to 5.51% (first iteration) and further to 5.13% (second iteration), corresponding to relative WER reductions of 66.7% and 6.9%, respectively. As a result of this work, a new bilingual Basque–Spanish resource has been produced based on Basque Parliament sessions, including 998 h of training data (audio segments + transcriptions), a development set (17 h long) designed for tuning and evaluation under a cross-validation scheme, and a fully bilingual trigram language model.

    This work was partially funded by the Spanish Ministry of Science and Innovation (OPEN-SPEECH project, PID2019-106424RB-I00) and by the Basque Government under the general support program to research groups (IT-1704-22).
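
    The selection criterion can be pictured as follows: each candidate segment is decoded with the current acoustic models, the recognized phone sequence is compared with the nominal one derived from the minutes, and only segments scoring above a similarity threshold enter the next training round. The sketch below is a minimal illustration of that loop; the edit-distance-based similarity, the threshold, the stopping rule and the helper callables (nominal_phones, decode_phones, train_models) are assumptions, not the paper's exact implementation.

```python
# Illustrative sketch of the iterative semisupervised selection loop described above.
# The similarity measure (normalized edit distance), the threshold and the stopping
# rule are assumptions for illustration, not the paper's exact recipe.

def phone_similarity(ref, hyp):
    """Similarity in [0, 1] between two phone sequences (1 = identical),
    computed from the Levenshtein distance."""
    n, m = len(ref), len(hyp)
    if max(n, m) == 0:
        return 1.0
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return 1.0 - prev[m] / max(n, m)


def iterative_selection(segments, nominal_phones, decode_phones, train_models,
                        baseline_models, threshold=0.9, max_iters=5, min_gain=0.005):
    """Iteratively refine the training set: decode with the current models, keep
    the segments whose recognized phones match the nominal ones closely enough,
    retrain, and stop when the selected fraction no longer grows appreciably
    (a proxy for the 'small enough improvement' criterion). The callables
    nominal_phones, decode_phones and train_models are supplied by the caller."""
    models, prev_ratio, selected = baseline_models, 0.0, []
    for _ in range(max_iters):
        selected = [s for s in segments
                    if phone_similarity(nominal_phones(s),
                                        decode_phones(models, s)) >= threshold]
        ratio = len(selected) / len(segments)
        if ratio - prev_ratio < min_gain:
            break
        models, prev_ratio = train_models(selected), ratio
    return models, selected
```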

    Search on speech from spoken queries: the Multi-domain International ALBAYZIN 2018 Query-by-Example Spoken Term Detection Evaluation

    The huge amount of information stored in audio and video repositories makes search on speech (SoS) a priority area nowadays. Within SoS, Query-by-Example Spoken Term Detection (QbE STD) aims to retrieve data from a speech repository given a spoken query. Research in this area is continuously fostered through the organization of QbE STD evaluations. This paper presents a multi-domain, internationally open evaluation for QbE STD in Spanish. The evaluation aims at retrieving the speech files that contain the queries, providing their start and end times and a score that reflects the confidence given to the detection. Three Spanish speech databases covering different domains have been employed in the evaluation: the MAVIR database, which comprises a set of talks from workshops; the RTVE database, which includes broadcast television (TV) shows; and the COREMAH database, which contains two-person spontaneous conversations about different topics. The evaluation has been designed carefully so that several analyses of the main results can be carried out. We present the evaluation itself, the three databases, the evaluation metrics, the systems submitted to the evaluation, the results, and detailed post-evaluation analyses based on some query properties (within-vocabulary/out-of-vocabulary queries, single-word/multi-word queries, and native/foreign queries). Fusion results of the primary systems submitted to the evaluation are also presented. Three different teams took part in the evaluation, and ten different systems were submitted. The results suggest that the QbE STD task is still a work in progress and that the performance of these systems is highly sensitive to changes in the data domain. Nevertheless, QbE STD strategies are able to outperform text-based STD in unseen data domains.

    Funding: Centro singular de investigación de Galicia (ED431G/04); Universidad del País Vasco (GIU16/68); Ministerio de Economía y Competitividad (TEC2015-68172-C2-1-P); Ministerio de Ciencia, Innovación y Competitividad (RTI2018-098091-B-I00); Xunta de Galicia (ED431G/0).
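
    Since the query is itself audio, a common template-matching approach to QbE STD (not necessarily the one used by the submitted systems) is to slide the query's feature sequence over each repository recording with subsequence dynamic time warping and report the best-matching region together with a confidence score. A minimal sketch, assuming frame-level feature matrices (e.g., MFCCs or phone posteriors) are already extracted:

```python
import numpy as np

def subsequence_dtw(query, utterance):
    """Subsequence DTW between a query (n x d) and an utterance (m x d).

    Returns (score, start_frame, end_frame); a lower score means a better
    match. Cosine distance and the simple step pattern are assumptions.
    """
    q = query / (np.linalg.norm(query, axis=1, keepdims=True) + 1e-8)
    u = utterance / (np.linalg.norm(utterance, axis=1, keepdims=True) + 1e-8)
    dist = 1.0 - q @ u.T                    # (n x m) cosine distance matrix
    n, m = dist.shape
    acc = np.full((n, m), np.inf)
    start = np.zeros((n, m), dtype=int)     # where each warping path entered row 0
    acc[0] = dist[0]                        # the match may start at any utterance frame
    start[0] = np.arange(m)
    for i in range(1, n):
        for j in range(m):
            candidates = [(acc[i - 1, j], start[i - 1, j])]
            if j > 0:
                candidates += [(acc[i, j - 1], start[i, j - 1]),
                               (acc[i - 1, j - 1], start[i - 1, j - 1])]
            best_cost, best_start = min(candidates)
            acc[i, j] = dist[i, j] + best_cost
            start[i, j] = best_start
    end = int(np.argmin(acc[-1] / n))       # normalize by query length
    return float(acc[-1, end] / n), int(start[-1, end]), end
```

    A detection is then reported as (file, start time, end time, score) once frame indices are converted to seconds using the feature frame shift.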

    ALBAYZIN 2018 spoken term detection evaluation: a multi-domain international evaluation in Spanish

    Search on speech (SoS) is a challenging area due to the huge amount of information stored in audio and video repositories. Spoken term detection (STD) is an SoS-related task that aims to retrieve data from a speech repository given a textual representation of a search term (which can include one or more words). This paper presents a multi-domain, internationally open evaluation for STD in Spanish. The evaluation has been designed carefully so that several analyses of the main results can be carried out. The evaluation task aims at retrieving the speech files that contain the terms, providing their start and end times and a score that reflects the confidence given to the detection. Three Spanish speech databases covering different domains have been employed in the evaluation: the MAVIR database, which comprises a set of talks from workshops; the RTVE database, which includes broadcast news programs; and the COREMAH database, which contains two-person spontaneous conversations about different topics. We present the evaluation itself, the three databases, the evaluation metric, the systems submitted to the evaluation, the results, and detailed post-evaluation analyses based on some term properties (within-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and native/foreign terms). Fusion results of the primary systems submitted to the evaluation are also presented. Three different research groups took part in the evaluation, and 11 different systems were submitted. The obtained results suggest that the STD task is still a work in progress and that performance is highly sensitive to changes in the data domain.

    Funding: Ministerio de Economía y Competitividad (TIN2015-64282-R); Ministerio de Economía y Competitividad (RTI2018-093336-B-C22); Ministerio de Economía y Competitividad (TEC2015-65345-P); Xunta de Galicia (ED431B 2016/035); Xunta de Galicia (GPC ED431B 2019/003); Xunta de Galicia (GRC 2014/024); Xunta de Galicia (ED431G/01); Xunta de Galicia (ED431G/04); Agrupación estratéxica consolidada (GIU16/68); Ministerio de Economía y Competitividad (TEC2015-68172-C2-1-).
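
    The metric is not spelled out in this summary; the de facto standard for STD evaluations is the NIST-style Term-Weighted Value, which averages a weighted combination of miss and false-alarm probabilities over the search terms. A sketch of that standard computation, assuming detections have already been aligned against the reference occurrences:

```python
def actual_twv(terms, speech_seconds, beta=999.9):
    """NIST-style Actual Term-Weighted Value (a sketch of the standard formula).

    `terms` maps each term to a dict with:
      n_true - number of true occurrences in the reference
      n_hit  - correct detections at the chosen decision threshold
      n_fa   - false alarms at that threshold
    `speech_seconds` is the total duration of the searched speech. Terms with
    no true occurrences are skipped, as in the NIST definition.
    """
    twv_sum, n_terms = 0.0, 0
    for stats in terms.values():
        if stats["n_true"] == 0:
            continue
        p_miss = 1.0 - stats["n_hit"] / stats["n_true"]
        # Non-target trials are approximated by one trial per second of speech.
        p_fa = stats["n_fa"] / (speech_seconds - stats["n_true"])
        twv_sum += 1.0 - (p_miss + beta * p_fa)
        n_terms += 1
    return twv_sum / n_terms if n_terms else 0.0
```

    A perfect system scores 1, a system that outputs nothing scores 0, and false alarms are heavily penalized through the weight beta.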

    An Overview of the IberSpeech-RTVE 2022 Challenges on Speech Technologies

    Evaluation campaigns provide a common framework with which the progress of speech technologies can be effectively measured. The aim of this paper is to present a detailed overview of the IberSpeech-RTVE 2022 Challenges, which were organized as part of the IberSpeech 2022 conference under the ongoing series of Albayzin evaluation campaigns. In the 2022 edition, four challenges were launched: (1) speech-to-text transcription; (2) speaker diarization and identity assignment; (3) text and speech alignment; and (4) search on speech. Databases covering different domains (e.g., broadcast news, conference talks, parliament sessions) were released for these challenges. The submitted systems also cover a wide range of speech processing methods, including hidden Markov model-based approaches, end-to-end neural network-based methods, and hybrid approaches. This paper describes the databases, the tasks and the performance metrics used in the four challenges. It also summarizes the most relevant features of the submitted systems and briefly presents and discusses the obtained results. Despite employing state-of-the-art technology, the relatively poor performance attained in some of the challenges reveals that there is still room for improvement, which encourages us to carry on with the Albayzin evaluation campaigns in the coming years.

    This work was partially supported by Radio Televisión Española through the RTVE Chair at the University of Zaragoza and by the Red Temática en Tecnologías del Habla (RED2022-134270-T), funded by AEI (Ministerio de Ciencia e Innovación); it was also partially funded by the European Union’s Horizon 2020 research and innovation program under Marie Skłodowska-Curie Grant 101007666; in part by MCIN/AEI/10.13039/501100011033 and by the European Union “NextGenerationEU”/PRTR under Grants PDC2021-120846C41 and PID2021-126061OB-C44, and in part by the Government of Aragon (Grant Group T3623R); it was also partially funded by the Spanish Ministry of Science and Innovation (OPEN-SPEECH project, PID2019-106424RB-I00), by the Basque Government under the general support program to research groups (IT-1704-22), and by projects RTI2018-098091-B-I00 and PID2021-125943OB-I00 (Spanish Ministry of Science and Innovation and ERDF) as well.
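
    As an example of the kind of performance metric involved, diarization systems are conventionally scored with the Diarization Error Rate (DER). The official scoring setup of the challenge (forgiveness collars, overlapped speech handling) is not detailed here, so the sketch below is a simplified frame-level version for illustration only.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def diarization_error_rate(ref, hyp):
    """Simplified frame-level Diarization Error Rate.

    ref, hyp: sequences of equal length with one speaker label per frame,
    or None for non-speech. No collar and no overlapped speech are handled,
    unlike the official scoring tools; the reference must contain speech.
    """
    ref_spk = sorted({r for r in ref if r is not None})
    hyp_spk = sorted({h for h in hyp if h is not None})
    overlap = np.zeros((len(ref_spk), len(hyp_spk)))
    for r, h in zip(ref, hyp):
        if r is not None and h is not None:
            overlap[ref_spk.index(r), hyp_spk.index(h)] += 1
    # An optimal one-to-one speaker mapping maximizes the total overlap.
    rows, cols = linear_sum_assignment(-overlap)
    correct = overlap[rows, cols].sum()
    ref_speech = sum(r is not None for r in ref)
    miss = sum(r is not None and h is None for r, h in zip(ref, hyp))
    false_alarm = sum(r is None and h is not None for r, h in zip(ref, hyp))
    confusion = (ref_speech - miss) - correct   # speech assigned to the wrong speaker
    return (miss + false_alarm + confusion) / ref_speech
```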

    Theoretical and practical contributions to speech technologies: Sautrela, a development environment

    404 p. Speech Technologies (Tecnologías del Habla, TH) cover a wide range of research areas. However, many of these disciplines lack software tools that allow new methodologies or designs to be implemented in a simple way. The researcher thus becomes a programmer who must modify his or her working tool over and over again. Owing both to the complexity involved and to the problems entailed by modifying third-party software, research groups often need to create their own software, which offers greater knowledge of, and control over, the working tool. This dissertation details the structure of the Sautrela development environment, which was developed entirely by the candidate and is both a goal in itself and a useful element throughout his research career, covering numerous works that are also summarized in this document. Sautrela is also a key tool for the activity of the Grupo de Trabajo en Tecnologías Software (Working Group on Software Technologies), which the candidate co-founded, and it is offered to the scientific community as open-source software. The tool defines a versatile and extensible set of components aimed at the development of pattern recognition systems, with a special focus on speech technologies. Its modular and highly configurable nature offers researchers the opportunity to tackle a large number of problems without adding a single line of code. Sautrela is also an extensible environment that allows new components developed by third parties to be integrated easily.
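
    The central design idea, assembling recognition systems from reusable components driven purely by configuration, can be illustrated with a generic sketch. This is not Sautrela's actual API; the component names and the registry below are hypothetical and only mirror the configuration-driven pipeline concept described above.

```python
# Generic illustration of a configuration-driven processing pipeline.
# Component names and the registry are hypothetical; Sautrela's real API differs.

from typing import Any, Callable, Dict, Iterable, List

REGISTRY: Dict[str, Callable[..., Callable[[Iterable[Any]], Iterable[Any]]]] = {}

def component(name: str):
    """Register a component factory under a name usable from a config file."""
    def wrap(factory):
        REGISTRY[name] = factory
        return factory
    return wrap

@component("frame")
def frame(size: int = 400, shift: int = 160):
    def run(stream):
        for signal in stream:
            yield [signal[i:i + size] for i in range(0, len(signal) - size + 1, shift)]
    return run

@component("energy")
def energy():
    def run(stream):
        for frames in stream:
            yield [sum(x * x for x in fr) / len(fr) for fr in frames]
    return run

def build_pipeline(config: List[dict]):
    """Chain the components declared in a config, e.g. loaded from JSON/XML."""
    stages = [REGISTRY[c["name"]](**c.get("params", {})) for c in config]
    def run(stream):
        for stage in stages:
            stream = stage(stream)
        return stream
    return run

# A new setup is just a new configuration, with no new code:
pipeline = build_pipeline([{"name": "frame", "params": {"size": 4, "shift": 2}},
                           {"name": "energy"}])
print(list(pipeline([[0.0, 1.0, 0.0, -1.0, 0.0, 1.0]])))   # [[0.5, 0.5]]
```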

    Language Recognition on the Albayzin 2010 LRE using PLLR features

    Phone Log-Likelihood Ratios (PLLR) have recently been proposed as alternative features to MFCC-SDC for iVector-based Spoken Language Recognition (SLR). In this paper, PLLR features are first described, and then further evidence of their usefulness for SLR tasks is provided through a new set of experiments on the Albayzin 2010 LRE dataset, which features wide-band, multi-speaker TV broadcast speech in six languages: Basque, Catalan, Galician, Spanish, Portuguese and English. iVector systems built using PLLR features, computed by means of three open-source phone decoders, achieved significant relative improvements with respect to the phonotactic and MFCC-SDC iVector systems in both clean and noisy speech conditions. Fusions of the PLLR systems with the phonotactic and/or the MFCC-SDC iVector systems led to further performance improvements, revealing that PLLR features provide complementary information in both cases.

    This work has been supported by the University of the Basque Country under grant GIU10/18 and project US11/06, and by the Government of the Basque Country under the SAIOTEK program (project S-PE12UN55). M. Diez is supported by a research fellowship from the Department of Education, Universities and Research of the Basque Government.
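
    PLLRs are obtained by turning the frame-level phone posteriors produced by a phone decoder into log-odds (logit) scores, which then replace MFCC-SDC at the input of a standard iVector back end. A minimal sketch of that transformation; the clipping constant and any subsequent projection or normalization steps are assumptions rather than the exact recipe used in the paper:

```python
import numpy as np

def pllr_features(posteriors, eps=1e-10):
    """Convert frame-level phone posteriors into Phone Log-Likelihood Ratios.

    posteriors: array of shape (n_frames, n_phones); each row sums to 1.
    Returns an array of the same shape with PLLR_i = log(p_i / (1 - p_i)).
    """
    p = np.clip(np.asarray(posteriors, dtype=float), eps, 1.0 - eps)
    return np.log(p) - np.log1p(-p)   # logit, computed in a numerically stable way

# Example: three frames of posteriors over four phone classes.
post = np.array([[0.70, 0.10, 0.10, 0.10],
                 [0.25, 0.25, 0.25, 0.25],
                 [0.05, 0.05, 0.80, 0.10]])
pllr = pllr_features(post)
print(pllr.shape)   # (3, 4) -- one PLLR vector per frame
```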

    Language verification in telephone conversations and television news broadcasts (GLOSA)

    In this brief communication we present the GLOSA project, financed by the Government of the Basque Country for the period 2010-2011. The project has two main technological objectives: (1) creating a suitable infrastructure for the development and evaluation of language recognition technologies; and (2) preparing a competitive language recognition system for conversational telephone speech, which will eventually be submitted to the NIST 2011 Language Recognition Evaluation. From an academic point of view, the project aims to implement and improve state-of-the-art language verification techniques.

    This project has been supported by the Government of the Basque Country under the SAIOTEK program (project S-PE10UN87) and by the University of the Basque Country under grant GIU10/18.

    Television news retrieval system in Spanish and Basque

    This paper presents Hearch, a spoken document retrieval system with the look and feel of a conventional search engine that retrieves audio/video segments based on the automatic transcription of their speech content. The system consists of a back-end that captures, processes and indexes audio/video resources, and a front-end that allows users to search content, configure the various modules and display performance statistics through a web interface. An early version of the tool is available (http://gtts.ehu.es/Hearch/), which searches and retrieves segments from TV broadcast news repositories in Spanish and Basque. To evaluate the performance of the system, six manually transcribed TV news programs in Spanish and seven in Basque have been used. Since the Automatic Speech Recognition module introduces a fair number of errors, an approach based on extending the query with so-called friendly (related) terms has been proposed and evaluated to broaden the results returned by the system. This approach led to a slight performance improvement.

    This work has been supported by the University of the Basque Country under grant GIU10/18, by the Government of the Basque Country under the SAIOTEK program (project S-PE10UN87), and by the Spanish MICINN under the Plan Nacional de I+D+i (project TIN2009-07446, partially financed by FEDER funds). M. Diez is supported by a research fellowship from the Department of Education, Universities and Research of the Basque Government.
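
    The query-expansion idea, adding related ("friendly") terms to the query so that relevant segments are still found despite ASR errors, can be sketched as follows. The abstract does not say how the related terms are obtained, so the expansion dictionary, the down-weighting factor and the term-frequency scoring below are illustrative assumptions, not Hearch's actual implementation.

```python
from collections import Counter

def expand_query(query_terms, related, weight=0.5):
    """Add related ('friendly') terms to the query with a reduced weight.

    `related` maps a term to a list of related terms; its source (thesaurus,
    co-occurrence statistics, embeddings, ...) is left unspecified here.
    """
    weights = {t: 1.0 for t in query_terms}
    for t in query_terms:
        for r in related.get(t, []):
            weights.setdefault(r, weight)
    return weights

def score_segment(transcript_tokens, query_weights):
    """Weighted term-frequency score of an ASR-transcribed segment."""
    tf = Counter(transcript_tokens)
    return sum(w * tf[t] for t, w in query_weights.items())

# Example with a hypothetical related-terms dictionary:
related = {"eleccion": ["comicio", "votacion"]}
q = expand_query(["eleccion", "vasca"], related)
segments = {"seg1": "la votacion vasca de ayer".split(),
            "seg2": "el partido de futbol".split()}
ranking = sorted(segments, key=lambda s: score_segment(segments[s], q), reverse=True)
print(ranking)   # ['seg1', 'seg2']
```

    In this toy example the segment containing only the related term "votacion" is still retrieved, which is the intended effect when the ASR output misses or distorts the original query word.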
